Computation Model: Synchronous Data Flow
All systems with the computation model Synchronous Data Flow.
Systems (17)
AWS Trainium
f(x) = High-throughput transformer and large-scale neural network training across sparse and dense workloads on EC2 Trn1 clusters.
AWS's custom training chip powering EC2 Trn1 instances, pairing high-throughput NeuronCore compute with Elastic Fabric Adapter (EFA) networking for massive multi-node synchronization across dense and sparse machine learning workloads.
Google TPU v1
f(x) = datacenter AI inference acceleration
Google's first TPU, announced in 2016, ties a large 256×256 systolic array built for dense matrix multiplies to local weight memory, so inference workloads across Google data centers run deterministically at high throughput.
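A minimal sketch (in NumPy, not TPU microcode) of the weight-stationary dataflow such a systolic array implements: weights stay pinned in the PE grid while activations stream through and partial sums accumulate along columns; cycle-level skew is abstracted away.

import numpy as np

def systolic_matmul(a, w):
    # Weight-stationary schedule: PE (k, n) holds w[k, n] for the whole
    # computation. Each activation row of `a` streams across the grid and
    # its partial sums ripple down the columns until a finished dot
    # product exits the bottom. (Pipeline skew is not modeled.)
    m_dim, k_dim = a.shape
    _, n_dim = w.shape
    out = np.zeros((m_dim, n_dim))
    for i in range(m_dim):            # one activation wavefront at a time
        acc = np.zeros(n_dim)         # partial sums flowing down columns
        for k in range(k_dim):        # PE rows with stationary weights
            acc += a[i, k] * w[k, :]  # multiply, add, forward downward
        out[i, :] = acc
    return out

a, w = np.random.randn(4, 8), np.random.randn(8, 3)
assert np.allclose(systolic_matmul(a, w), a @ w)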
Google TPU v2
f(x) = AI training and inference acceleration
Google's second-generation TPU v2 is a datacenter-scale AI accelerator built around large systolic arrays, high-bandwidth memory, and bfloat16 matrix units, forming Cloud TPU v2 pods to deliver high-throughput training and inference.
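A small illustration of what bfloat16 trades away: it keeps float32's 8-bit exponent but only 7 mantissa bits, so matrix-unit inputs lose precision while accumulation stays in float32. A rough software emulation (round-to-nearest-even on the top 16 bits of a float32; finite inputs assumed):

import numpy as np

def to_bfloat16(x):
    # Keep float32's sign and 8 exponent bits, round the 23-bit mantissa
    # down to bfloat16's 7 bits (round-to-nearest-even), then widen back
    # to float32 for inspection.
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    lsb = (bits >> 16) & 1
    rounded = bits + 0x7FFF + lsb
    return (rounded & 0xFFFF0000).view(np.float32)

x = np.array([1.2345678, 3.1415927], dtype=np.float32)
print(to_bfloat16(x))   # ~[1.234375, 3.140625]: about 3 decimal digits survive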
Google TPU v3
f(x) = AI training and inference acceleration
Third-generation Google TPU pairs bfloat16 matrix-multiply units (accumulating in float32) with HBM, and Cloud TPU v3 pods deliver up to 8x the performance of v2 pods, providing massive training and inference acceleration.
Google TPU v4
f(x) = dense matrix multiply and transformer attention pipelines
Google TPU v4 is a fourth-generation pod-scale accelerator that deterministically realizes dense linear algebra and transformer attention via custom systolic arrays. Each TPU v4 die pairs stacked HBM with its matrix-multiply units.
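To make "attention as dense linear algebra" concrete, here is single-head scaled dot-product attention written as the matmuls (plus a softmax) that a pod of systolic arrays would execute; shapes and sizes are illustrative.

import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: two dense matmuls around a row-wise
    # softmax -- exactly the matrix work a systolic array is built for.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

seq, d = 16, 64
q, k, v = (np.random.randn(seq, d) for _ in range(3))
assert attention(q, k, v).shape == (seq, d)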
Google TPU v5
f(x) = High-throughput tensor acceleration for deep learning training and inference
Google's fifth-generation TPU (v5) is a datacenter AI accelerator optimized for massive matrix multiplies; each chip exposes more matrix units than v4, and when assembled into TPU v5 pods it delivers high aggregate training and inference throughput.
Groq Tensor Streaming Processor
f(x) = deterministic, statically scheduled tensor streaming for deep learning inference
The Groq Tensor Streaming Processor executes statically scheduled tensor programs with cycle-deterministic timing, so ML inference workloads observe predictable latency through its massively pipelined data flows.
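A toy illustration of the compile-time-scheduled style (ops and cycle counts are hypothetical, not Groq's ISA): the schedule fixes which operation runs on which cycle, so end-to-end latency is known exactly before execution starts.

# The "compiler" emits a fixed cycle-by-cycle schedule; execution just
# replays it -- no queues, no arbitration, no cache misses to vary timing.
schedule = [
    (0, "load",  lambda s: s.update(x=[1.0, 2.0, 3.0])),
    (1, "scale", lambda s: s.update(x=[2.0 * v for v in s["x"]])),
    (2, "sum",   lambda s: s.update(y=sum(s["x"]))),
]

state = {}
for cycle, name, op in schedule:
    op(state)
    print(f"cycle {cycle}: {name} -> {state}")
# Latency is exactly len(schedule) cycles on every run, for every input.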
Intel Xe-HPC
f(x) = Dense HPC GPU acceleration for AI training, scientific simulation, and matrix algebra
Ponte Vecchio GPUs combine HBM2e stacks, Xe cores with wide vector and XMX matrix engines, and a multi-tile chiplet design fabricated across several process nodes, packing thousands of SIMT lanes per tile and coordinating them through EMIB bridges and Foveros 3D stacking.
Lightmatter Passage
f(x) = photonic inference
Lightmatter Passage is an optical AI accelerator that performs light-based matrix multiplies, driving an optical dataflow through a waveguide matrix engine.
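One way to see how a mesh of interferometers can realize an arbitrary matrix (a numerical sketch, not Lightmatter's design): factor the matrix by SVD into two unitaries, which map onto interferometer meshes, and a diagonal, which maps onto per-channel attenuation or gain.

import numpy as np

# M = U @ diag(s) @ Vh: Vh and U are unitary (realizable as meshes of
# 2x2 Mach-Zehnder interferometers); diag(s) is per-channel gain/loss.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
U, s, Vh = np.linalg.svd(M)

x = rng.standard_normal(4)      # input vector encoded as optical amplitudes
y = U @ (s * (Vh @ x))          # three optical stages applied in flight
assert np.allclose(y, M @ x)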
Luminous Computing
f(x) = photonic logic for AI
Luminous Computing centers on photonic logic for AI, building coherent-light neural accelerators orchestrated via optical dataflow.
MIT Tagged-Token Dataflow Architecture
f(x) = Id dataflow semantics with tag-based dynamic scheduling
MIT Tagged-Token Dataflow Architecture pairs tag-based dynamic scheduling with tagged token contexts that encode activation frames, letting distributed execution units match tokens by tag, dispatch operands, and fire instructions as soon as all inputs arrive.
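A toy matcher showing the tagged-token firing rule (instruction names and token format are illustrative): an instruction fires only when tokens carrying the same tag have arrived on both input ports, so separate loop iterations interleave freely.

from collections import defaultdict

waiting = defaultdict(dict)     # (instruction, tag) -> {port: value}
ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def inject(instr, tag, port, value):
    # A token is (instr, tag, port, value); `tag` names the activation
    # frame / loop iteration this operand belongs to.
    slot = waiting[(instr, tag)]
    slot[port] = value
    if len(slot) == 2:          # both operands matched: fire
        print(f"{instr}[tag={tag}] fired ->", ops[instr](slot[0], slot[1]))
        del waiting[(instr, tag)]

# Tokens from two loop iterations arrive interleaved and out of order:
inject("add", tag=0, port=0, value=1)
inject("add", tag=1, port=0, value=10)
inject("add", tag=1, port=1, value=20)  # iteration 1 fires first
inject("add", tag=0, port=1, value=2)   # then iteration 0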
Manchester Dataflow Machine
f(x) = fine-grained token-driven computation
The Manchester Dataflow Machine, designed at the University of Manchester in the late 1970s and operational by the early 1980s, emphasized token-based dataflow execution: tokens flow through FIFO queues, a matching store pairs operands bound for the same instruction, and operations fire out of order as soon as their operands arrive.
Normal Computing stochastic processing units
f(x) = Unconventional analog thermodynamic inference
Normal Computing's stochastic processing units leverage probabilistic analog circuits with thermodynamic noise shaping and memristive elements to accelerate AI inference workloads while embracing physical randomness as a computational resource.
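A sketch of the computational style such hardware targets, with noise doing the useful work (parameters hypothetical, not Normal Computing's circuits): overdamped Langevin dynamics whose injected noise makes the state wander according to a target distribution, here a standard Gaussian with energy E(x) = x²/2.

import numpy as np

rng = np.random.default_rng(1)
x, dt, steps = 0.0, 0.01, 200_000
samples = np.empty(steps)
for t in range(steps):
    grad = x                                    # dE/dx for E(x) = x^2 / 2
    x += -grad * dt + np.sqrt(2 * dt) * rng.standard_normal()
    samples[t] = x

print(samples.mean(), samples.std())            # approaches 0.0 and 1.0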
SambaNova RDU (Reconfigurable Dataflow Unit)
f(x) = AI training and inference dataflow graphs
Reconfigurable Dataflow Units implement granular dataflow graphs by combining configurable tiles with per-tile scheduling and streaming data paths. Each tile bundles compute arrays, SRAM buffers, and switch-fabric links that stream intermediate results directly to the next stage.
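A toy version of the streaming-tile idea (tile roles are illustrative, not SambaNova's programming model): each stage is a generator, and values flow tile-to-tile without round-tripping through shared memory.

def load_tile(data):                    # tile 1: feed operands into the fabric
    for x in data:
        yield x

def mac_tile(stream, weight, bias):     # tile 2: multiply-accumulate in place
    for x in stream:
        yield weight * x + bias

def relu_tile(stream):                  # tile 3: activation on the stream
    for x in stream:
        yield max(0.0, x)

pipeline = relu_tile(mac_tile(load_tile([-2.0, -1.0, 0.5, 3.0]), 2.0, -1.0))
print(list(pipeline))                   # [0.0, 0.0, 0.0, 5.0]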
Tenstorrent Grayskull
f(x) = AI inference acceleration
Tenstorrent Grayskull is a tile-based architecture of Tensix compute cores, each pairing matrix math engines with local SRAM, delivering massive data-parallel tensor throughput across the grid of cores.
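A block-partitioned matmul hints at how work maps onto such a grid (a NumPy sketch, not Tenstorrent's kernel format): each output tile is an independent unit of work for one core, with operand tiles staged through that core's local SRAM.

import numpy as np

def tiled_matmul(a, b, tile=4):
    # Each (i, j) output tile can be assigned to one compute core; the
    # core streams K-dimension tiles through local memory and accumulates.
    # Dimensions are assumed divisible by `tile`.
    m, k = a.shape
    _, n = b.shape
    out = np.zeros((m, n))
    for i in range(0, m, tile):
        for j in range(0, n, tile):         # one core per output tile
            for p in range(0, k, tile):     # accumulate over the K tiles
                out[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile])
    return out

a, b = np.random.randn(8, 8), np.random.randn(8, 8)
assert np.allclose(tiled_matmul(a, b), a @ b)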
Tenstorrent Wormhole
f(x) = AI accelerator
Tenstorrent Wormhole is a multi-chip module designed for large language models, providing high-bandwidth Ethernet-based chip-to-chip interconnect and integration with the Tenstorrent software stack.
UPMEM PIM
f(x) = parallel search and graph analytics near memory
UPMEM Processing-In-Memory DIMMs combine DRAM banks with embedded RISC DPUs, enabling datacenter-scale parallel search and graph analytics without moving data back to host CPUs.
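A toy model of the near-memory pattern (threads stand in for DPUs; the striping and search predicate are illustrative): each worker scans only its own bank and ships back match coordinates, so bulk data never crosses to the host.

from concurrent.futures import ThreadPoolExecutor

def dpu_search(bank_id, bank, needle):
    # Runs "inside" the DIMM: touches only this bank's local data.
    return [(bank_id, i) for i, v in enumerate(bank) if v == needle]

data = list(range(1000)) * 4
banks = [data[i::8] for i in range(8)]          # stripe across 8 banks

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda args: dpu_search(*args),
                            [(b, banks[b], 777) for b in range(8)]))
hits = [h for part in results for h in part]
print(hits)     # only (bank, offset) pairs cross back to the "host"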